Self-representation based dual-graph regularized feature selection clustering

Authors

  • Ronghua Shang
  • Zhu Zhang
  • Licheng Jiao
  • Chiyang Liu
  • Yangyang Li
Abstract

Feature selection algorithms eliminate irrelevant and redundant features, as well as noise, while preserving the most representative features. They reduce the dimensionality of a dataset, extract the essential features of high-dimensional data, and improve learning quality. Existing feature selection algorithms are all carried out in data space, so the information in feature space cannot be fully exploited. To compensate for this drawback, this paper proposes a novel feature selection algorithm for clustering, named self-representation based dual-graph regularized feature selection clustering (DFSC). It adopts the self-representation property, namely that data can be represented by itself. Meanwhile, the local geometrical information of both data space and feature space is preserved simultaneously. By imposing an l2,1-norm constraint on the self-representation coefficient matrix in data space, DFSC can effectively select the most representative features for clustering. We give the objective function, develop iterative updating rules, and provide a convergence proof. Two kinds of extensive experiments on several datasets demonstrate the effectiveness of DFSC. Comparisons with several state-of-the-art feature selection algorithms illustrate that additionally considering the information of feature space, based on the self-representation property, improves clustering quality. Moreover, because the additional feature selection process selects the most important features and so preserves the intrinsic structure of the dataset, the proposed algorithm achieves better clustering results than some co-clustering algorithms. © 2015 Elsevier B.V. All rights reserved.
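
The abstract does not give the DFSC objective function or update rules, so the following is only a minimal sketch of the one ingredient it names explicitly: the l2,1-norm, which is the sum of the Euclidean norms of a matrix's rows. It assumes, as is common in l2,1-regularized feature selection, that each row of the coefficient matrix corresponds to one feature and that features are ranked by row norm. The names `l21_norm` and `rank_features` and the example matrix `W` are hypothetical and are not taken from the paper.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (W is a list of equal-length rows)."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]


def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows of W."""
    return sum(row_norms(W))


def rank_features(W, k):
    """Indices of the k rows with the largest norm, largest first.

    Assumption: row i of W holds the coefficients tied to feature i,
    so a near-zero row marks a feature that contributes little.
    """
    scores = row_norms(W)
    order = sorted(range(len(W)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Made-up coefficient matrix for four features (illustration only).
W = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0
    [1.0, 0.0],  # row norm 1
    [6.0, 8.0],  # row norm 10
]

total = l21_norm(W)       # 5 + 0 + 1 + 10 = 16
top2 = rank_features(W, 2)  # features 3 and 0 have the largest row norms
```

Penalizing this norm pushes entire rows toward zero rather than individual entries, which is why it lends itself to selecting or discarding whole features.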

Similar resources

Gene Feature Extraction Based on Nonnegative Dual Graph Regularized Latent Low-Rank Representation

Aiming at the problem of gene expression profile's high redundancy and heavy noise, a new feature extraction model based on nonnegative dual graph regularized latent low-rank representation (NNDGLLRR) is presented on the basis of latent low-rank representation (Lat-LRR). By introducing dual graph manifold regularized constraint, the NNDGLLRR can keep the internal spatial structure of the origin...

Dual-graph regularized concept factorization for clustering

In past decades, the amount of text documents and images has grown tremendously, and it is very important to group them into clusters as desired. Recently, matrix factorization based techniques, such as Non-negative Matrix Factorization (NMF) and Concept Factorization (CF), have yielded impressive results for clustering. However, both of them effectively see only the gl...

Learning manifold to regularize nonnegative matrix factorization

In this chapter we discuss how to learn an optimal manifold presentation to regularize nonnegative matrix factorization (NMF) for data representation problems. NMF, which tries to represent a nonnegative data matrix as a product of two low-rank nonnegative matrices, has been a popular method for data representation due to its ability to explore the latent part-based structure of data. Recent stu...

Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps

Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Publication date: 2015